1. INTRODUCTION

The detection of moving objects in a video sequence is an important task in computer vision. It
plays a central role in video surveillance systems because the detection result influences all
subsequent steps. Detection can be very complex because of disruptive elements in the environment
of the object. Factors such as weather conditions, changes in scene lighting, shadows, and moving
background elements (tree branches, windows, a computer screen, etc.) may negatively influence the
detection process.

To overcome these problems, several approaches have been proposed in the literature. These
include background modeling methods, which can be classified into several categories: basic,
statistical, fuzzy, etc. [1]. A thorough analysis of these methods has shown that statistical
methods are generally more robust to illumination changes and dynamic backgrounds [2]-[4].

The principle of detecting moving objects in a video sequence is based on classifying the pixels
of the image as foreground (mobile) or background (static). Given that the background of a video
sequence often contains non-static objects such as tree branches, we have chosen to use the
mixture of Gaussians (MOG) method for background modeling, as it is the best adapted to such
situations [1], [5]. Note that the original MOG method [2] uses the components of the RGB color
space, which are very sensitive to changes in lighting and are not independent. This is why, in
some previous works, other color spaces were preferred that are more robust to changes in
lighting while having independent components. The color spaces most often used in these works
are normalized RGB [6], YUV [7], [8], HSV [8] and LUV [9].

Among the important issues encountered when detecting moving objects is the impact of shadows,
whose presence in an image can change the perceived shape and color of objects. Unfortunately,
the points of an object and those of its shadow share two important visual characteristics:
movement and shape [10]. Therefore, however the background is updated, the moving points
corresponding to objects and to shadows are detected simultaneously and grouped together. This
greatly alters the shape of the detected silhouettes and negatively affects several tasks that
depend on detection, namely tracking, classification, and the estimation of the position of
moving objects. To overcome these limitations, we start from the assumption that shaded areas
can be detected by choosing a color space that separates chroma and intensity better than the
RGB space does [11]-[14]. The choice of the HSV color space is motivated by its capacity to
separate the intensity component (V) from the chromatic components (H and S) [15], [16].
However, when the saturation S of a pixel is below a certain threshold, the pixel is considered
"achromatic" and its saturation S and hue H are no longer taken into account. Consequently, our
analysis is based only on the intensity component V to decide whether a pixel belongs to a new
element or to the background.
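
The handling of achromatic pixels described above can be illustrated with a minimal sketch. The threshold value `S_MIN` and the function name `hsv_features` are assumptions made for illustration; the paper does not state the saturation threshold it uses.

```python
import colorsys

# Hypothetical saturation threshold below which a pixel is treated as achromatic
# (the paper does not give its value).
S_MIN = 0.1


def hsv_features(r, g, b, s_min=S_MIN):
    """Convert an 8-bit RGB pixel to HSV and drop H and S for achromatic pixels.

    Returns (h, s, v) with every component in [0, 1]. For achromatic pixels
    h and s are returned as None, so later stages rely on the intensity V alone.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s < s_min:
        return None, None, v
    return h, s, v
```

For a gray pixel such as `(128, 128, 128)` the saturation is zero, so only V is kept; a saturated red pixel keeps all three components.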

In what follows, to detect shadows in a video sequence, we use the color information available
in the HSV space. A shaded background should in principle have the same color with lower
brightness. Methods have already been proposed to detect and remove shadows in the HSV space.
However, these methods use only static thresholds to separate shadows from the foreground [12],
[16], [17]. As time advances, changes in lighting make it increasingly difficult to remove
shadows correctly with static thresholds. This is why we propose to use a dynamic threshold
instead. Since shadows are strongly related to lighting, they can be removed more accurately by
changing the threshold values dynamically, taking into account the strength of the shadow and
the noise.

This paper is organized as follows. Section 2 presents previous work on the detection of moving
objects and on the detection and removal of shadows. Section 3 presents the method proposed in
this work to detect and remove shadows, and recalls the principle of the statistical method used
for background modeling. The results obtained by our method and a comparison with those of
existing methods are detailed in Section 4. Finally, the last section concludes with a discussion
of the performance of the proposed method.


2. METHODS FOR DETECTING MOVING OBJECTS AND THEIR SHADOWS 

Given the importance of detecting moving objects in a video sequence, several approaches have
been developed to provide methods that are robust to complex conditions (non-rigid objects,
dynamic backgrounds, etc.) [18], [19]. Among the many methods proposed in the literature, and
according to the works presented in [20]-[24], we can distinguish two major categories of
detection methods: those with and those without background modeling [1].

Detection techniques without background modeling generally perform a spatio-temporal
difference of the pixel intensity values that constitute the frames of the video sequence [25].
In their simplest form, these techniques use only the pixels of two consecutive frames of the
video. Although they manage to extract most of the relevant pixels of moving regions, they are
generally sensitive to dynamic changes such as lighting or background changes [26]. More
sophisticated methods, which use the statistical characteristics of each pixel, were therefore
developed to overcome the defects of the detection methods without background modeling.
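
The simplest form of detection without background modeling, the difference of two consecutive frames, can be sketched as follows. This is only an illustration: frames are assumed to be grayscale images stored as lists of rows, and the threshold value of 25 is a hypothetical choice, not a value taken from the cited works.

```python
def frame_difference(prev_frame, curr_frame, threshold=25):
    """Return a binary motion mask from two consecutive grayscale frames.

    A pixel is marked as moving (1) when its absolute intensity change
    between the two frames exceeds the (hypothetical) threshold.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]
```

Because the mask depends only on two frames, a global lighting change larger than the threshold marks every pixel as moving, which illustrates the sensitivity mentioned above.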

The first detection methods with background modeling modeled each pixel by a single Gaussian
[27]. To account for the multi-modal aspect, Stauffer and Grimson [27] were among the authors to
propose a Gaussian mixture model. Kim et al. [4] proposed a "codebook" dictionary method that can
handle a moving and noisy background but requires a learning step that can be long. As a general
rule, background modeling techniques are robust to noise and can adapt to the presence of new
objects in the background. However, they remain vulnerable to many environmentally induced
phenomena such as shadows [10].

The detection of shadows has been the subject of several works. The proposed methods have been
used in different areas such as the recognition of moving objects, video surveillance and road
traffic tracking. Andres et al. [28] undertook a comparative study of recent shadow detection
methods. They classified these methods into a taxonomy of four categories according to the
features used: chromaticity, physics, geometry and textures.

Other researchers have addressed the same issue. Wenbo et al. [14] proposed a model to predict
the threshold used to detect shadows. Cucchiara et al. [12] put forward a technique that uses
information obtained in the HSV color space to eliminate shadows: the observed values of the
three HSV components are compared to those of the background model. Moreover, Cucchiara et al.
[11] described an extension of the method presented in [12] in which a higher-level reasoning
component classifies regions as background, shadow, phantom or ghost shadow, or moving object.

In [10], the authors developed a shadow detection technique applicable in the context of
background subtraction. Their technique considers two measures: one characterizing the brightness
distortion and the other the color distortion. When changes are detected, both measures are
compared to thresholds to determine whether a change is a shadow or a significant change. This
technique aims especially at reducing computation time. Nadimi and Bhanu [29] introduced a method
that keeps the pixels corresponding to shadows while eliminating the others. Horprasert et al.
[30] proposed a method based on the RGB color space to classify pixels into four categories:
original background, illuminated background, shadowed background, and moving pixels. To this
end, two steps are added to the method based only on the RGB color space: the computation of the
chromatic distortion and that of the brightness distortion.
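
The two distortion measures mentioned above can be sketched in a simplified form. The version below is an assumption made for illustration: it omits the per-channel variance normalization used in the published formulation, and the function name is hypothetical. The brightness distortion is the scale factor that brings the background color closest to the observed color, and the chromatic distortion is the distance that remains after this scaling.

```python
import math


def brightness_and_color_distortion(pixel, background):
    """Simplified brightness distortion (alpha) and chromatic distortion (cd).

    pixel and background are (R, G, B) tuples. alpha minimizes
    ||pixel - alpha * background||, and cd is the residual norm.
    Per-channel variance normalization is deliberately left out.
    """
    dot_pb = sum(p * b for p, b in zip(pixel, background))
    dot_bb = sum(b * b for b in background)
    alpha = dot_pb / dot_bb if dot_bb else 0.0
    cd = math.sqrt(sum((p - alpha * b) ** 2 for p, b in zip(pixel, background)))
    return alpha, cd
```

A pixel that is a darker copy of the background, such as `(50, 50, 50)` against `(100, 100, 100)`, gives an alpha of 0.5 and a chromatic distortion of zero, which is the signature expected of a shadow.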

Prajakta et al. [31] proposed two techniques to detect shadows in outdoor images with a constant
background under variable lighting conditions. In the first technique, they use global
thresholding on the H and V values of the HSV color space to calculate the ratio map. The
threshold used to produce the binary image from the global image is then determined using Otsu's
method [32]. The gradient map is computed using the Sobel operator on the V component of the HSV
space. The shadow is then obtained from the gradient map using the threshold of the global image.
In the second technique, the image in the HSV color space is converted into an RGB color image in
order to build the ratio map. The threshold used to produce the binary image is then determined
using Otsu's method. The regions are labeled in order to eliminate all small areas, and the local
threshold for the shadow is obtained using Otsu's method. Finally, Priya and Kirtika [33]
proposed a new algorithm that uses chromaticity to detect and remove shadows from outdoor still
images.
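
Since Otsu's method is used repeatedly in the techniques above, a minimal sketch of it may help. It assumes integer gray levels in the range 0 to `levels - 1` given as a flat list; the function name is chosen for illustration. The method selects the threshold that maximizes the between-class variance of the two groups of pixels it separates.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    values is a flat iterable of integer gray levels in [0, levels - 1].
    Pixels with value <= t form one class, the remaining pixels the other.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

On a clearly bimodal set of values, the returned threshold falls between the two modes.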


3. EXTRACTING OBJECTS FROM THE FOREGROUND

When extracting objects from the foreground [34], the method we propose in the present work has
two objectives. The first is to achieve a good detection of the moving objects in the scene. The
second is to ensure that this detection does not confuse the detected objects with their
shadows, which are to be removed afterwards. In addition, to be operational in any environment,
the proposed method has to be robust to changes in brightness in the scene. In what follows, we
present the method used for extracting the foreground.


3.1. Model of the background 

Several color models such as the RGB, HSV, YUV and L*a*b spaces have been used for the
statistical modeling of the background of a video sequence. In our case, we propose to use the
HSV color space, whose advantage lies in its invariance to luminosity.


3.2. Modeling of the background in the HSV color space

To track and analyze the characteristics of an object (its movement, speed, trajectory, etc.),
it is necessary to detect it first. The basic detection technique models the background from
multiple images that are acquired sequentially.

The color characteristics refer primarily to the components of color spaces, which can be
treated separately or jointly. Although the RGB space is the most widespread [35], some authors
use the HSV color space [12], [36] while others [16] use the L*a*b space. These spaces are more
robust than the RGB space [11] because they increase invariance with respect to changes in
luminosity and lighting and to the presence of shadows. Note that a recent study [16] has shown
that the HSV color space performs better than the L*a*b space.

To extract the pixels in the foreground, we propose to use, for each pixel of the background, an
adaptation of the mixture of Gaussians (MOG) modeling proposed by Stauffer et al. [2], which
consists in changing the color space. Indeed, we propose to work in the HSV color space while
relying


Detecting and Shadows in the HSV Color Space using Dynamic Thresholds (Boutaina Hdioud) 


1516 O ISSN: 2088-8708 


solely on the intensity V to decide whether the pixel belongs to a new element or to the
background. Thus, the observations corresponding to a pixel, which vary over time, are considered
as a process $X_t$ defined as follows:
a. The process $X_t$ is initialized by the recent pixel values:


$$X_t = \{x_1, \ldots, x_t\} \quad \text{with} \quad x_i = [H_i, S_i, V_i]$$


Since we are interested only in the intensity component (V), we keep only the coordinate $V_i$ of
$x_i$, that is to say $x_i = [V_i]$.

b. Each pixel of the reference image is modeled by a mixture of k probability densities. The
probability that a pixel belongs to the background is given by:


$$P(X_t) = \sum_{i=1}^{k} \omega_{i,t}\, \eta(X_t, \mu_{i,t}, \Sigma_{i,t})$$


$k$: the number of Gaussians used in the mixture. In our case, this number varies from 0 to 3
instead of 3 to 5.

$\omega_{i,t}$: the weight assigned to each Gaussian, representing the proportion of the data
used in the calculation of the Gaussian at time t.

$\eta$: a multidimensional Gaussian function defined by a mean vector $\mu$ and a covariance
matrix $\Sigma$:


$$\eta(X_t, \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(X_t - \mu)^T \Sigma^{-1} (X_t - \mu)\right)$$


where $d$ represents the dimension of the space and $\Sigma = \sigma^2 I$, with $I$ the identity
matrix. Once the proposed model based on the MOG method is built, it is updated over time using
the expectation-maximization (EM) algorithm through the equations proposed in [10]. Note that, in
order to remove isolated pixels and extract the moving objects, we made use of mathematical
morphology techniques.
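
Because only the V component is kept, the mixture reduces to one-dimensional Gaussians (d = 1, with the covariance reduced to a scalar variance). The following minimal sketch evaluates the mixture probability for a single pixel under that assumption; the function names and the example parameters are hypothetical and are not taken from the paper.

```python
import math


def gaussian_pdf(v, mean, var):
    """One-dimensional Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_probability(v, weights, means, variances):
    """Weighted sum of the k Gaussian densities evaluated at intensity v."""
    return sum(
        w * gaussian_pdf(v, m, s2) for w, m, s2 in zip(weights, means, variances)
    )


# Hypothetical three-component model for one pixel (V scaled to 0..255).
example_p = mixture_probability(
    100.0, weights=[0.6, 0.3, 0.1], means=[100.0, 150.0, 30.0], variances=[25.0, 36.0, 49.0]
)
```

An intensity close to the mean of a heavily weighted Gaussian yields a high probability, while an intensity far from every component yields a value close to zero.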


3.3. Extraction of the foreground 

In our algorithm, the decision on whether a pixel of a given image belongs to the background or
to the foreground is made by calculating a Mahalanobis-type distance [37]. This distance is
calculated between the recent value of the pixel and each of the Gaussians for that pixel. The
pixel value matches a Gaussian calculated on the V component of the HSV color space if:


$$\sqrt{(X_t - \mu_{i,t})^T\, \Sigma_{i,t}^{-1}\, (X_t - \mu_{i,t})} < 2.5\,\sigma_{i,t}$$


where $\mu_{i,t}$, $\Sigma_{i,t}$ and $\sigma_{i,t}$ are the mean, the covariance matrix and the
standard deviation of the i-th Gaussian. The image of the foreground is then obtained by applying
a threshold on the weight of the matching Gaussian to determine whether it corresponds to the
background or to the foreground.
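
The decision rule of this section can be sketched for a single pixel as follows. The sketch rests on two assumptions that are not stated in the paper: the match test is applied in its usual one-dimensional form, where the intensity must lie within 2.5 standard deviations of the mean, and the weight threshold `WEIGHT_MIN` is a hypothetical value.

```python
import math

MATCH_SIGMAS = 2.5   # matching radius, in standard deviations
WEIGHT_MIN = 0.25    # hypothetical weight threshold for background Gaussians


def classify_pixel(v, weights, means, variances, weight_min=WEIGHT_MIN):
    """Classify intensity v as 'background' or 'foreground'.

    The first Gaussian matched by v decides the label: it is background
    when its weight reaches weight_min, foreground otherwise. A pixel that
    matches no Gaussian is foreground.
    """
    for w, m, s2 in zip(weights, means, variances):
        if abs(v - m) < MATCH_SIGMAS * math.sqrt(s2):
            return "background" if w >= weight_min else "foreground"
    return "foreground"
```

A low-weight Gaussian typically represents a value that has been observed only recently, which is why a match with it is still treated as foreground.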


3.4. Detection and removal of the shadow 

The mixture of Gaussians technique can detect any moving object in a scene but does not
distinguish between the detected objects and their shadows. The purpose of this sub-section is
to improve our system for detecting moving objects so that shadows are no longer classified as
foreground elements.


3.4.1. Removing shadows 

In many cases, detecting the shadow of an object is not easy because the points belonging to
the object and the points corresponding to the shadow can have a similar visual appearance,
especially when working with gray levels. The detection of the shadow is based solely on a
discrimination between the appearance of shadows and that of objects in terms of brightness and
color. In our case, we need a process that detects the shadow in order to remove it, using the
results obtained in the extraction phase described in the previous section. Thus, to distinguish
moving objects from their shadows, we use the HSV color space, and our algorithm relies on a
test applied to $F(x, y)$, the foreground of the V (brightness) component. A point $(x, y)$ is
classified as a shadow when it satisfies the following property: the inverse of the foreground
$F(x, y)$ of the V component is bounded above by a dynamic threshold $A/B$ whose value will be
determined later.

This test is derived from the fact that the presence of a shadow in an area often results in a
significant change in brightness with no great modification of the color information. This is
why we require that the inverse of the background be compared to a threshold $A/B$ (with
$0 < A/B < 1$). The first factor $A$ accounts for the strength of the shadow (the lower the
value of $A$, the darker the shadows cast on the covered objects), while the second factor $B$
is used to increase the robustness to noise (the brightness of the current image cannot be too
close to that of the background).


3.4.2. Dynamic and automatic thresholding 
The calculation of the dynamic threshold for classifying points as shadow or non-shadow requires
determining the factors A and B. For this purpose, the following procedure is used:
a. Once the absolute difference between the current image and the background in the HSV color space is
measured, calculate the median MED and the median absolute deviation MAD within the foreground
mask $F_m$ detected in the previous section. The formulas used are:


$$MED = \underset{(x,y) \in F_m}{\mathrm{Median}}\; D_{x,y} \quad \text{and} \quad MAD = \underset{(x,y) \in F_m}{\mathrm{Median}}\; \left| D_{x,y} - MED \right|$$


with $D_{x,y} = |F_{x,y} - B_{x,y}|$, where $F_{x,y}$ and $B_{x,y}$ respectively represent the
image and the background in the HSV color space.
b. Then we adopt the same technique to calculate the median for the V component, where the
medians of the foreground and of the background are calculated within the selection mask of the
foreground $F_m$, with


$$MED\_Fv = \underset{(x,y) \in F_m}{\mathrm{Median}}\; F^{V}_{x,y} \quad \text{and} \quad MED\_Bv = \underset{(x,y) \in F_m}{\mathrm{Median}}\; B^{V}_{x,y}$$


c. Finally, the values of A and B are calculated using the following expressions:


$$A = \frac{MAD}{MED\_Fv} \quad \text{and} \quad B = \frac{MED + MAD}{MED\_Bv}$$
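
The three steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the difference D is computed on the V channel only (the text does not say how the HSV channels are combined), images are lists of rows, the mask is a list of rows of 0/1 values, and the function name is hypothetical.

```python
from statistics import median


def dynamic_threshold_factors(frame_v, background_v, mask):
    """Compute the factors A and B from the V channel inside the foreground mask.

    frame_v, background_v: 2-D lists of V values for the current image and
    the background model. mask: 2-D list of 0/1 values (foreground mask).
    """
    f_vals, b_vals = [], []
    for f_row, b_row, m_row in zip(frame_v, background_v, mask):
        for f, bg, m in zip(f_row, b_row, m_row):
            if m:
                f_vals.append(f)
                b_vals.append(bg)
    if not f_vals:
        raise ValueError("the foreground mask is empty")

    # Step a: median and median absolute deviation of the difference D.
    diffs = [abs(f - bg) for f, bg in zip(f_vals, b_vals)]
    med = median(diffs)
    mad = median([abs(d - med) for d in diffs])

    # Step b: medians of the V component of the image and of the background.
    med_fv = median(f_vals)
    med_bv = median(b_vals)
    if med_fv == 0 or med_bv == 0:
        raise ValueError("median brightness is zero, factors are undefined")

    # Step c: the two factors.
    a = mad / med_fv
    b = (med + mad) / med_bv
    return a, b
```

Because the medians are recomputed for every frame, the resulting threshold follows the lighting of the scene instead of staying fixed.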


3.4.3. Improving the detection quality 

Eliminating shadows is not enough to make the resulting image usable. In most cases, isolated
pixels that do not belong to the detected object remain. In addition, "holes" are easily
noticeable in the detected objects. To overcome this limitation and provide a uniform result
free of noise, the quality of the detection generally has to be improved.

To this end, we propose to use mathematical morphology operators, which are powerful tools. In
our case, the objective is to rid the image of any isolated pixel, for which an erosion operation
is suitable. Note that this step can optionally be performed before the identification of the
shadows. The following figure shows the two operations, which consist in detecting the object
and removing its shadow.


Figure 1. (a) Detecting the object, (b) Removing the shadow
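
The erosion step used to remove isolated pixels can be sketched as below. The 3x3 square structuring element and the treatment of border pixels are assumptions made for illustration; the paper does not specify the structuring element it uses.

```python
def erode(mask):
    """Binary erosion with a 3x3 square structuring element.

    mask is a 2-D list of 0/1 values. A pixel stays at 1 only when it and
    its eight neighbours are all 1. Border pixels are set to 0.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if all(mask[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out
```

Erosion also shrinks the detected object by one pixel on each side, so in practice it is often followed by a dilation with the same structuring element to restore the object size.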


4. EXPERIMENTS

This section presents the results of the tests we carried out with the proposed method on a set
of video sequences extracted from the PETS-ECCV'2004 CAVIAR videos, the PETS-2006 database, and a
scene from the campus of the University of San Diego. The objective of these tests is to evaluate
the ability of our method to detect moving objects while removing their shadows.


4.1. Results for detecting moving objects 

The comparative tests carried out in this section aim to show the benefit of using the HSV color
space in the detection phase. For this purpose, our detection method was compared with the one
developed in [38], [39], which is an adaptation of the MOG method. In these tests, our detection
method was limited to 3 Gaussians and used the HSV color space, or more precisely the V component
of this space. For the MOG method, we considered the cases of 3 and 5 Gaussians in the RGB color
space. For these three methods, we applied morphological operators to get rid of isolated pixels.
The results obtained are shown in Figure 2 below.


Figure 2. (a) the original image, (b) results of our proposed method, (c) results of the MOG method using 3 
Gaussians, (d) results of the MOG method using 5 Gaussians 


The results in Figure 2 show that the MOG method produces detection errors, which appear either
as holes in the detected object or as a poor detection of the object in the scene caused by
changes in brightness. The results also show that our method is capable of detecting objects
accurately using only 3 Gaussians. To improve its detection quality, the MOG method requires
increasing the number of Gaussians to 5, which increases the computation time.


4.2. Results concerning the suppression of the shadow 

To test the performance of the proposed method with regard to shadow removal, we applied it to
various scenes under different conditions. Figure 3 shows an example of the results obtained for
the case of two very close objects as well as for a case including a change of brightness. The
results show that our method, based on dynamic thresholding in the HSV color space, provides
better results both when detecting objects and when removing shadows. This confirms its
robustness to changes in brightness.


Int J Elec & Comp Eng, Vol. 8, No. 3, June 2018: 1513 — 1521 




Figure 3. (a) and (b) results for removing the shadow, (c) and (d) results for shadow detection


4.3. Comparison of the proposed method with other existing methods 

To evaluate the performance of our method, we compared it with two existing methods for
detecting objects and removing their shadows. The comparison takes into account the following
parameters: the type of threshold used for shadow detection and the response time. This
comparative study therefore involves three methods for detecting and removing shadows. The first
is the one implemented in the OpenCV library, which uses the techniques proposed in [38]-[40];
it is used here with the HSV space. The second is the method proposed by Cucchiara et al. [12].
The third is the method proposed in the present work.

The above-mentioned methods were evaluated in terms of shadow detection and removal on video
sequences in the HSV color space. The thresholds used to remove the shadow are static for the
first and second methods. For example, for the first method the parameter $\tau$ is set to 0.5,
while for the Cucchiara method we used the parameter values of $\alpha$ and $\beta$ given in
[4], that is to say 0.4 for $\alpha$ and 0.6 for $\beta$. On the other hand, the method proposed
in this paper uses a dynamic threshold calculated using the approach presented in Section 3. The
results are shown in Figure 4 below.


Figure 4. (a) the original image, (b) the results obtained with our method, (c) the results obtained with the 
method implemented in the OpenCV library and (d) the results obtained with the method of Cucchiara 
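
For reference, a static-threshold shadow test in the HSV space, in the spirit of the second method compared here, can be sketched as follows. The bounds 0.4 and 0.6 are the values quoted in this section; the chroma tolerances `TAU_S` and `TAU_H` and the function name are hypothetical, and all components are assumed to be scaled to the range 0 to 1.

```python
ALPHA, BETA = 0.4, 0.6    # static brightness-ratio bounds quoted in the text
TAU_S, TAU_H = 0.1, 0.1   # hypothetical chroma tolerances (not from the paper)


def is_shadow_static(pixel_hsv, background_hsv,
                     alpha=ALPHA, beta=BETA, tau_s=TAU_S, tau_h=TAU_H):
    """Static-threshold shadow test on one pixel.

    A pixel is a shadow candidate when it is darker than the background by
    a bounded ratio while its saturation and hue stay close to the background.
    """
    h, s, v = pixel_hsv
    bh, bs, bv = background_hsv
    if bv == 0:
        return False
    ratio = v / bv
    dh = abs(h - bh)
    dh = min(dh, 1.0 - dh)  # hue is circular on [0, 1)
    return alpha <= ratio <= beta and abs(s - bs) <= tau_s and dh <= tau_h
```

With fixed bounds, a shadow that becomes lighter or darker as the lighting changes can fall outside the interval, which is the limitation that motivates a dynamic threshold.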

Analysis of the results in Figure 4 shows that the proposed method is more effective for the
detection and removal of shadows than the other two methods. Indeed, the method implemented in
OpenCV and that of Cucchiara produce errors when eliminating shadows (these errors are surrounded
by a green rectangle in Figure 4), which is not the case for our method, since it is able to
properly remove the shadows in the scene.


5. CONCLUSION 

In this paper, we proposed a new method to extract objects of interest from video sequences
while removing their shadows. For the detection phase, our method is based on background
modeling, which enables the classification of pixels as background or foreground. For this
purpose, we proposed an adaptation of the MOG method in which the foreground is extracted using
the intensity component (V) of the HSV color space, while the background model is updated to
take its potential variations into account.

The presence of shadows in the image perturbs the object extraction model. To detect the
shadows that are to be removed, the shadow elimination phase of the proposed method is based on
the HSV color space and uses a dynamic threshold. The implementation of this method on video
sequences has shown that it works properly, as it correctly extracts moving objects while
removing their shadows.